Building Substring Indices Using Sequence BDDs

Authors

  • Shuhei Denzumi
  • Hiroki Arimura
  • Shin-ichi Minato
Abstract

There is a demand for efficient substring index data structures that can store all substrings of a given text. Suffix trees and Directed Acyclic Word Graphs (DAWGs) are examples of substring indices, but they lack operations for manipulating sets of strings. The Sequence Binary Decision Diagram (SeqBDD), proposed by Loekito, Bailey, and Pei, is a new type of Binary Decision Diagram (BDD) that represents sets of sequences. This study focuses mainly on two issues: (a) compact substring indices based on SeqBDDs, called Suffix Decision Diagrams (SuffixDDs), which represent the set of all substrings efficiently and support the various set operations inherited from Zero-suppressed Binary Decision Diagrams (ZDDs), and (b) methods for building a SuffixDD incrementally, beginning with the empty string and updating the structure whenever a new letter is read. The paper presents an efficient algorithm for constructing a SuffixDD for a given text, together with a proof of correctness and some notes about BDD families, and discusses why the new data structure appears to have advantages over existing substring indices. An upper bound on the running time is also obtained. The authors hope that presenting this data structure to a wider audience will promote useful discussion of the important issues.
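To make the idea concrete, here is a minimal Python sketch of a SeqBDD-style structure and a naive letter-by-letter construction of a substring index. It is an illustration under stated assumptions, not the algorithm from the paper: the names `node`, `union`, `append`, `member`, `count`, and `substring_index` are hypothetical, and the update rule simply maintains the set of all suffixes and unions it into the set of all substrings after each letter. The node semantics assumed here are the usual ones for SeqBDDs: a node `(x, zero, one)` represents `x·L(one) ∪ L(zero)`, labels increase along 0-edges, and a node whose 1-child is the empty set is suppressed.

```python
from functools import lru_cache

ZERO, ONE = 0, 1            # terminals: the empty set and the set {""}
_unique = {}                # (label, zero, one) -> node id (hash-consing)
_table = [None, None]       # node id -> (label, zero, one)

def node(label, zero, one):
    """Return the shared node for label·L(one) ∪ L(zero)."""
    if one == ZERO:         # zero-suppression rule
        return zero
    key = (label, zero, one)
    if key not in _unique:
        _table.append(key)
        _unique[key] = len(_table) - 1
    return _unique[key]

@lru_cache(maxsize=None)
def union(p, q):
    """Set union; keeps labels increasing along 0-edges."""
    if p == ZERO:
        return q
    if q == ZERO or p == q:
        return p
    if q == ONE:
        p, q = q, p
    if p == ONE:            # push the empty string down the 0-chain of q
        x, z, o = _table[q]
        return node(x, union(ONE, z), o)
    xp, zp, op = _table[p]
    xq, zq, oq = _table[q]
    if xp < xq:
        return node(xp, union(zp, q), op)
    if xq < xp:
        return node(xq, union(p, zq), oq)
    return node(xp, union(zp, zq), union(op, oq))

@lru_cache(maxsize=None)
def append(p, c):
    """Append the letter c to every string in the set p."""
    if p == ZERO:
        return ZERO
    if p == ONE:
        return node(c, ZERO, ONE)
    x, z, o = _table[p]
    # union() restores the label order that appending may disturb
    return union(node(x, ZERO, append(o, c)), append(z, c))

def member(p, s):
    """Test whether string s belongs to the set p."""
    i = 0
    while p > ONE:
        x, z, o = _table[p]
        if i < len(s) and s[i] == x:
            p, i = o, i + 1
        else:
            p = z
    return p == ONE and i == len(s)

@lru_cache(maxsize=None)
def count(p):
    """Number of strings in the set p."""
    if p <= ONE:
        return p
    _, z, o = _table[p]
    return count(z) + count(o)

def substring_index(text):
    """Naive online construction: one update per letter read."""
    suffixes = ONE          # all suffixes of the text read so far
    substrings = ONE        # all substrings of the text read so far
    for c in text:
        suffixes = union(append(suffixes, c), ONE)
        substrings = union(substrings, suffixes)
    return substrings

idx = substring_index("banana")
```

Because the result is an ordinary set diagram, a substring query is a single `member(idx, s)` call, and set operations such as `union` can combine indices of different texts, which is the kind of manipulation that suffix trees and DAWGs do not offer directly.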


Similar resources

Suffix-DDs: Substring Indices Based on Sequence BDDs for Constrained Sequence Mining

In this paper, we study an efficient index structure, called Suffix Decision Diagrams (SuffixDDs), for knowledge discovery in large sequence data. Recently, Loekito, Bailey, and Pei (KAIS, 2009) proposed a new data structure for sequence data, called Sequence Binary Decision Diagram (SeqBDD), which is an extension of Zero-suppressed Binary Decision Diagrams (ZDDs) for sequences. SuffixDD is a c...


Efficient Symbolic Simulation via Dynamic Scheduling, Don't Caring, and Case Splitting

Most computer-aided design frameworks rely upon building BDD representations from netlist descriptions. In this paper, we present efficient algorithms for building BDDs from netlists. First, we introduce a dynamic scheduling algorithm for building BDDs for gates of the netlist, using an efficient hybrid of depth- and breadth-first traversal, and constant propagation. Second, we introduce a dynami...


Studies on Decision Diagrams for Efficient Manipulation of Sets and Strings

In many real-life problems, we are often faced with manipulating discrete structures. Manipulation of large discrete structures is one of the most important problems in computer science. For this purpose, a family of data structures called decision diagrams is used. The origin of decision diagrams is the binary decision diagram (BDD), proposed by Bryant in the 1980s. BDD is a data structure to repre...


Higher-level Specification and Verification with BDDs

Currently, many are investigating promising verification methods based on Boolean decision diagrams (BDDs). Using BDDs, however, requires modeling the system under verification in terms of Boolean formulas. This modeling can be difficult and error-prone, especially when dealing with constructs like arithmetic, sequential control flow, and complex data structures. We present new techniques for automat...


Reconstructing Strings from Substrings

We consider an interactive approach to DNA sequencing by hybridization, where we are permitted to ask questions of the form "is s a substring of the unknown sequence S?", where s is a specific query string. We are not told where s occurs in S, nor how many times it occurs, just whether or not s is a substring of S. Our goal is to determine the exact contents of S using as few queries as possible. ...




Publication date: 2010